A New Visualization Technique to Study the Time Evolution of Finite and Adaptive Mixture Estimators


This paper focuses on recent work that analyzes the expectation maximization (EM) evolution of mixture-based estimators. The goal of this research is the development of effective visualization techniques to portray the mixture model parameters as they change in time, an inherently high-dimensional process. Techniques are presented that portray the time evolution of univariate, bivariate, and trivariate finite and adaptive mixtures estimators. Adaptive mixtures is a recently developed variable bandwidth kernel estimator in which the kernels are not constrained to reside at sample locations. The future role of these techniques in developing new versions of the adaptive mixtures procedure is also discussed.

1: Introduction

Given $X = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ is $d$-dimensional and i.i.d. according to an unknown density $f_0(x)$, one is often interested in estimating $f_0(x)$. This problem occurs in such areas as exploratory data analysis, classification, and regression. There are a variety of approaches to the multivariate density estimation problem (Scott, 1992).

An often-used parametric approach is finite mixtures density estimation (FMDE) (Everitt and Hand, 1981) in combination with the expectation maximization (EM) method of Dempster, Laird, and Rubin (1977). One difficulty with this tactic is that one needs some idea of the appropriate number of terms in the mixture model as well as the approximate parameter values. Given this information, the EM algorithm is guaranteed to converge to at least a local maximum of the likelihood surface.

Some of the previous nonparametric approaches include histograms (Sturges, 1926), frequency polygons (Scott, 1985a), adaptive histograms (Wegman, 1970), average shifted histograms (Scott, 1985b), and kernel estimators (Silverman, 1986). These approaches are beneficial in that they possess nice asymptotic consistency properties, robustness with regard to nonnormality, and fewer parameters to estimate, which implies better estimates in the finite-sample regime. They are at a disadvantage compared to the mixture model approach when it is suspected that the unknown true density is a mixture of a number of terms and one would like to estimate the posterior probability of underlying term membership for an unlabeled observation.

A recently developed density estimation technique that circumvents some of the problems of the above techniques is the adaptive mixtures density estimation (AMDE) procedure (Priebe, 1994). This procedure is a blend of the finite mixtures and kernel estimator approaches. It is essentially a mixture-type approach that allows for the creation of new terms in a data-driven manner. We have successfully applied this technique, in combination with fractal-based features, to the detection of man-made objects in land (Solka, Priebe, and Rogers, 1992) and aerial (Priebe, Solka, and Rogers, 1993) images, to the general problem of texture classification (Solka, Priebe, and Rogers, 1993), and to the measurement of breast parenchymal tissue density (Priebe and Solka et al., 1994). The adaptive mixtures estimator is asymptotically consistent like the kernel estimator, but it has the added benefit of creating additional terms at a rate considerably less than the rate-$n$ creation of the kernel estimator, which places one term at each of the $n$ sample points.

An inherent difficulty with both the parametric FMDE and the nonparametric AMDE is understanding the time evolution of the system under the EM equations. Even in the simple FMDE case of a two-component univariate mixture, the evolution of the parameters is a five-dimensional process. The situation is worse in the case of AMDE, since the dimensionality of the problem increases each time a new term is added to the model. We discuss here a new visualization technique that makes the problem of understanding this time evolution more tractable.
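As a concrete check on these dimension counts, the following Python sketch counts the free parameters of a $g$-term, $d$-dimensional normal mixture. It assumes full covariance matrices and mixing proportions constrained to sum to one; the function name `mixture_param_count` is our own, hypothetical illustration.

```python
def mixture_param_count(g: int, d: int) -> int:
    """Free parameters of a g-term, d-dimensional normal mixture.

    Assumes full (symmetric) covariance matrices and mixing
    proportions that are constrained to sum to one.
    """
    if g < 1 or d < 1:
        raise ValueError("g and d must be positive integers")
    weights = g - 1                      # pi_1..pi_g minus the sum-to-one constraint
    means = g * d                        # one d-vector per term
    covariances = g * d * (d + 1) // 2   # upper triangle of each d x d matrix
    return weights + means + covariances


if __name__ == "__main__":
    print(mixture_param_count(2, 1))  # pi_1, mu_1, mu_2, sigma_1^2, sigma_2^2
    print(mixture_param_count(2, 2))
    print(mixture_param_count(8, 2))  # an eight-term bivariate model
```

For $g = 2$, $d = 1$ this gives the five dimensions mentioned above; each additional AMDE term adds $1 + d + d(d+1)/2$ dimensions, which is six in the bivariate case.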

There are several reasons why the ability to monitor this time evolution is important. First, in the case of FMDE the nature of the likelihood surface that drives this evolution is very poorly understood, and it is hoped that new insights into the nature of the likelihood surface can be obtained through close monitoring of the evolution of the parameters. Second, it is well known that FMDE under the EM method is only guaranteed to converge to a local maximum of the likelihood (Redner and Walker, 1984). One usually circumvents this difficulty by starting the mixture model at a variety of initial conditions in parameter space. Our visualization technique provides a convenient way to monitor the process, so that one can restart the procedure earlier when a run appears headed toward an undesirable solution.

In the case of AMDE, even less is known about the behavior of the system. We have used our visualization techniques to expand our understanding not only of the dynamics of the system but also of the character of the solutions that the procedure produces. This has led us to more efficient formulations of alternative local bandwidth estimators. Last but not least, we point out the known utility of visualization techniques for software verification: it is much easier to validate the workings of a software system using visualization techniques in combination with analytical procedures than with analytical procedures alone.

In Section 2 we present a quick review of FMDE and AMDE. This is followed by a discussion of some earlier attempts at visualization of AMDE models. We also present our new approach for the visualization of univariate, bivariate, and trivariate FMDE and AMDE. In Section 3 we present univariate, bivariate, and trivariate results obtained using this new visualization procedure. These results illustrate the utility of the procedure, and specific cases are presented that highlight some of the insights it can provide. The section goes on to explain how to obtain on-line access to the movies of the examples presented. In the final section we summarize the results and look ahead to future research efforts.


2: Approach

Finite Mixtures Density Estimation

Given an unknown distribution $f_0(x)$, we seek to model the distribution using $f(x;\Psi)$ defined by

$$f(x;\Psi) = \sum_{i=1}^{g} \pi_i\, K(x;\theta_i), \qquad (1)$$

where $K$ is some fixed density parameterized by $\theta_i$, and $\Psi = \{\pi_1, \theta_1, \pi_2, \theta_2, \ldots, \pi_g, \theta_g\}$.

The $\pi_i$'s are referred to as the mixing proportions; they are nonnegative and sum to one. For much of what follows we can assume that $K$ is the normal density, in which case $\theta_i$ becomes $\{\mu_i, \Sigma_i\}$. In the simplest case the mixture is assumed to have a single term, and the parameters that must be estimated are the mean and covariance of the distribution.
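A minimal Python sketch of equation (1) for the univariate normal case follows. It assumes that $\theta_i = (\mu_i, \sigma_i^2)$, i.e. that the second argument of $N(\cdot,\cdot)$ in this paper is a variance; the function names are our own and are not taken from the authors' MATLAB code.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density N(mean, var), with var a variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x: float, weights: Sequence[float], means: Sequence[float],
                variances: Sequence[float]) -> float:
    """Evaluate f(x; Psi) = sum_i pi_i K(x; theta_i) with K normal."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


if __name__ == "__main__":
    # Riemann-sum check that a two-term model integrates to (about) one.
    step = 0.001
    total = sum(mixture_pdf(-20.0 + k * step, [0.5, 0.5], [-2.0, 2.0], [0.1, 1.0])
                for k in range(40001)) * step
    print(round(total, 6))
```

For example, `mixture_pdf(x, [0.5, 0.5], [-2.0, 2.0], [0.1, 1.0])` evaluates the two-term model $0.5\,N(-2, 0.1) + 0.5\,N(2, 1)$ used in the figures, under the variance assumption stated above.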

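In the case of FMDE we begin with an initial guess as to $g$, the number of components, and $\Psi$, their parameter values. Given this initial guess, $\Psi$ is updated iteratively using the standard EM equations for normal mixtures:

$$\tau_{ij} = \frac{\pi_i\, K(x_j;\mu_i,\Sigma_i)}{\sum_{k=1}^{g} \pi_k\, K(x_j;\mu_k,\Sigma_k)}, \qquad \pi_i = \frac{1}{n}\sum_{j=1}^{n} \tau_{ij},$$

$$\mu_i = \frac{1}{n\pi_i}\sum_{j=1}^{n} \tau_{ij}\, x_j, \qquad \Sigma_i = \frac{1}{n\pi_i}\sum_{j=1}^{n} \tau_{ij}\,(x_j-\mu_i)(x_j-\mu_i)^T,$$

where $\tau_{ij}$ is computed from the current parameter estimates and then used to form the updated $\pi_i$, $\mu_i$, and $\Sigma_i$. Here $\tau_{ij}$ is the estimated posterior probability that $x_j$ belongs to term $i$, $\pi_i$ is the estimated mixing coefficient, $\mu_i$ is the $d$-dimensional estimated mean vector, and $\Sigma_i$ is the $d \times d$ estimated covariance matrix for the $i$th term.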
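The batch EM iteration described here can be sketched in Python for the univariate case. This is an illustration of the standard updates under the assumption of normal components, not the authors' MATLAB implementation; `em_step` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data: Sequence[float], weights: Sequence[float],
            means: Sequence[float], variances: Sequence[float]
            ) -> Tuple[List[float], List[float], List[float]]:
    """One batch EM iteration for a univariate normal mixture."""
    n, g = len(data), len(weights)
    # E-step: tau[i][j] = pi_i K(x_j; theta_i) / f(x_j; Psi)
    tau = [[0.0] * n for _ in range(g)]
    for j, x in enumerate(data):
        dens = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(g)]
        total = sum(dens)
        for i in range(g):
            tau[i][j] = dens[i] / total
    # M-step: posterior-weighted proportions, means, and variances.
    new_w, new_m, new_v = [], [], []
    for i in range(g):
        s = sum(tau[i])  # equals n * pi_i for the updated pi_i
        mu = sum(t * x for t, x in zip(tau[i], data)) / s
        var = sum(t * (x - mu) ** 2 for t, x in zip(tau[i], data)) / s
        new_w.append(s / n)
        new_m.append(mu)
        new_v.append(var)
    return new_w, new_m, new_v


if __name__ == "__main__":
    data = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
    w, m, v = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    for _ in range(50):
        w, m, v = em_step(data, w, m, v)
    print(w, m, v)
```

Recording the parameters after each call to `em_step` yields the sequence of states that a visualization of the EM evolution would animate.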


Adaptive Mixtures Density Estimation

There is an alternate formulation of the EM update equations that recursively updates the estimate of the parameters $\Psi$ based on a single new observation. This version can update the parameter estimates without storing the data set, but at the cost of much slower convergence. The AMDE was first formulated in terms of this recursive approach. The recursive update equations (6-9) are expressed in terms of the following quantities: $\tau_i$ is the estimated posterior probability of the new observation $x_n$ belonging to the $i$th term of the mixture, and $\pi_i$, $\mu_i$, and $\Sigma_i$ are, respectively, the estimated mixing coefficient, the $d$-dimensional estimated mean, and the $d \times d$ estimated covariance matrix of the $i$th term.
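As an illustration only, the Python sketch below implements one common stochastic-approximation form of a recursive EM update for a univariate normal mixture. The step size $1/n$ and the use of the previous estimates on the right-hand side are our assumptions and may differ from the authors' equations (6-9); `recursive_update` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def recursive_update(x: float, n: int, weights: Sequence[float],
                     means: Sequence[float], variances: Sequence[float]
                     ) -> Tuple[List[float], List[float], List[float]]:
    """Update a univariate normal mixture with one new observation x.

    Assumed form (step size 1/n, old estimates on the right-hand side),
    where n is the number of observations seen so far, including x:
        tau_i  = pi_i K(x; mu_i, var_i) / f(x)
        pi_i  <- pi_i  + (tau_i - pi_i) / n
        mu_i  <- mu_i  + tau_i / (n pi_i) * (x - mu_i)
        var_i <- var_i + tau_i / (n pi_i) * ((x - mu_i)**2 - var_i)
    """
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    new_w, new_m, new_v = [], [], []
    for w, m, v, d in zip(weights, means, variances, dens):
        tau = d / total
        gain = tau / (n * w)
        new_w.append(w + (tau - w) / n)
        new_m.append(m + gain * (x - m))
        new_v.append(v + gain * ((x - m) ** 2 - v))
    return new_w, new_m, new_v


if __name__ == "__main__":
    w, m, v = recursive_update(1.0, 10, [1.0], [0.0], [1.0])
    print(w, m, v)  # a single term: the mean moves 1/10 of the way toward x
```

Because $\sum_i (\tau_i - \pi_i) = 0$, the updated mixing coefficients still sum to one. The function returns new lists rather than modifying its inputs, which makes it easy to record the full parameter trajectory.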

The AMDE stochastic approximation approach is to recursively update $\Psi_t$, the estimate of the true parameters $\Psi_0$, while simultaneously providing the capability to expand the parameter space $\Psi$ if dictated by the complexity of the data. We note that in the AMDE case the parameter space is given by $\Psi = \{\pi_1, \theta_1, \pi_2, \theta_2, \ldots\}$, where the number of terms is not fixed in advance. The procedure

$$\Psi_{t+1} = \Psi_t + \left[1 - P_t(x_{t+1};\Psi_t)\right] U_t(x_{t+1};\Psi_t) + P_t(x_{t+1};\Psi_t)\, C_t(x_{t+1};\Psi_t,t) \qquad (10)$$

is used to recursively update the density. Here $P_t$ represents a possibly stochastic create decision and takes on the values 0 or 1. $U_t$ updates the current parameters using equations (6-9), while $C_t$ adds a new term to the model. As is implicit in the equation, the decision to add a new term is a function of the current data point, our current estimate of the parameters, and time. The time dependence is important in those cases where we wish to anneal the probability of creation as a function of training time.

The exact nature of the creation process is as follows. The Mahalanobis distance from the new observation $x_t$ to each of the terms in the model is computed using

$$\mathrm{MHD}(i) = \left(x_t - \mu^{(i)}\right)^T \left(\Sigma^{(i)}\right)^{-1} \left(x_t - \mu^{(i)}\right).$$

If $\mathrm{MHD}(i) > T_c$ (a create threshold) for every term $i$, then a new term is created at $\mu^{(\mathrm{new})} = x_t$, with covariance $\Sigma^{(\mathrm{new})} = \mathcal{A}\left(\Sigma^{(i)}\right)$ and mixing coefficient $\pi^{(\mathrm{new})} = 1/n$, assuming $x_t$ is the $n$th data point. Here $\mathcal{A}(\cdot)$ is a weighted average over the existing terms based on posterior probability. We also note that the mixing coefficients of the remaining terms are all equally decremented to accommodate the new term.
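The create-or-update rule of equation (10) can be sketched in Python for the univariate case. Several details are our assumptions rather than the authors' specification: $P_t$ is taken to be deterministic (create exactly when $\mathrm{MHD}(i) > T_c$ for every term), the weighted average for the new variance uses the posterior probabilities of the new point, "equally decremented" is read as subtracting the same amount $1/(nk)$ from each of the $k$ existing weights, and the update branch uses a $1/n$ stochastic-approximation step. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float], List[float]]  # weights, means, variances


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log of the univariate normal density (avoids underflow far from all terms)."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def posteriors(x: float, weights: Sequence[float], means: Sequence[float],
               variances: Sequence[float]) -> List[float]:
    """tau_i = pi_i K(x; theta_i) / f(x), computed with the log-sum-exp trick."""
    logs = [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def amde_step(x: float, n: int, weights: Sequence[float], means: Sequence[float],
              variances: Sequence[float], threshold: float = 1.0) -> Model:
    """Process the n-th observation x (n >= 2) against a non-empty model."""
    tau = posteriors(x, weights, means, variances)
    mhd = [(x - m) ** 2 / v for m, v in zip(means, variances)]
    if all(d > threshold for d in mhd):
        # Create: new term at x with weight 1/n and a posterior-weighted
        # average of the existing variances; every existing weight is reduced
        # by the same amount so the weights still sum to one.
        k = len(weights)
        new_pi = 1.0 / n
        new_w = [w - new_pi / k for w in weights]
        if min(new_w) <= 0.0:
            raise ValueError("equal decrement would make a weight non-positive")
        new_var = sum(t * v for t, v in zip(tau, variances))
        return new_w + [new_pi], list(means) + [x], list(variances) + [new_var]
    # Update: recursive EM step with the assumed step size 1/n.
    new_w, new_m, new_v = [], [], []
    for w, m, v, t in zip(weights, means, variances, tau):
        gain = t / (n * w)
        new_w.append(w + (t - w) / n)
        new_m.append(m + gain * (x - m))
        new_v.append(v + gain * ((x - m) ** 2 - v))
    return new_w, new_m, new_v


if __name__ == "__main__":
    # Hypothetical start: one term at a first point x = -2 with unit variance.
    model = ([1.0], [-2.0], [1.0])
    model = amde_step(-1.5, 2, *model)  # MHD = 0.25 <= 1: update
    model = amde_step(2.0, 3, *model)   # MHD > 1 for every term: create
    print(model)
```

The hypothetical data values in the usage example mimic the pattern in which a second point falls inside an existing term's support and updates it, while a third, distant point triggers creation.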

dF Space Representation

The challenge is to design an effective visualization technique to monitor the evolution of systems under either the finite or adaptive mixtures process. As indicated previously, the technique needs to deal with the inherent high dimensionality of the problem. In addition, it needs to provide a realistic portrayal of the system.

We have previously presented one attempt at static visualization of adaptive mixtures models (Priebe and Lorey et al., 1994). This approach came as a natural by-product of casting a mixture model within a Bayesian framework. In this case we can write our estimate as

$$f(x) = \int_{\Theta} \phi(x;\mu,\sigma)\, dF, \qquad (11)$$

where $dF$ is the measure on the parameter space $\Theta$. In the case of discrete mixtures $dF$ becomes a probability mass distribution and the integral is converted into a sum. We may represent the distribution associated with $dF$ as a group of points in $(\mu, \sigma, \pi)$ space. Our previous work rendered the support of $dF$ in $\mathbb{R}^2$ by plotting a circle whose radius is determined by the term's mixing coefficient and whose center is given by the term's mean and variance. For example, we represent the two-component mixture $f_0(x) = 0.5\,N(-2, 0.1) + 0.5\,N(2, 1)$ as shown in Figure 1.


Figure 1: dF space representation of $f_0(x) = 0.5\,N(-2, 0.1) + 0.5\,N(2, 1)$ (panel title: "dF Estimate").


While this approach has the advantage of truly representing the support of the underlying parametric distribution function, there is no convenient way to extend it to bivariate and trivariate mixtures. With this in mind, we propose the following approach.

Univariate Representation

In the case of univariate mixtures we represent each term in the mixture as a magenta ellipse whose major radius is related to the standard deviation of the term and whose center is given by the term's mean and mixing coefficient. (The graphics device can make these ellipses appear somewhat circular.) In addition, since our goal is to monitor the development of the model as influenced by the data, we also include functional plots of the underlying density from which the data were drawn, when available, as a green line, and of the current mixture model as a magenta line, along with a scatter plot of the data set along the $\mu_x$ axis. Returning to the radius of each term, we have chosen to set it exactly equal to the standard deviation of the term. In Figure 2 we present this representation with $f_0(x) = 0.75\,N(-2, 0.25) + 0.25\,N(2, 2)$ as the true probability density function and the current state of the mixture model at $f(x) = 0.5\,N(-2, 0.1) + 0.5\,N(2, 1)$.

Figure 2: Sample screen snapshot for the univariate EM algorithm case (panel title: "Iteration number 0").


In the case of adaptive mixtures we employ the added convention of indicating the most recently created term by using a red ‘+’ instead of a ‘*’ at its center.

Bivariate Representation

We next discuss the bivariate case, which follows naturally once we step away from the dF space representation. We represent each term in the mixture as an ellipse whose shape and eccentricity are determined by the solution of

$$\left(x - \mu^{(i)}\right)^T \left(\Sigma^{(i)}\right)^{-1} \left(x - \mu^{(i)}\right) = 1.$$

Hence we represent the term as a magenta ellipse centered at $(\mu^{(i)}_x, \mu^{(i)}_y, \pi^{(i)})$, which resides in $(\mu_x, \mu_y, \pi)$ space and is parallel to the $(\mu_x, \mu_y)$ plane. As before, we form a scatter plot of the data in the $(\mu_x, \mu_y)$ plane. Figure 3 provides an example plot based on a bimodal two-component mixture $f$, and we have included 100 points drawn from $f_0 = f$ as part of the illustration.
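One way to compute the glyph for a bivariate term is sketched below in Python; it is not the authors' MATLAB code, and `term_ellipse` is a hypothetical name. It writes $\Sigma^{(i)} = LL^T$ with $L$ the Cholesky factor, so the points $\mu^{(i)} + L(\cos t, \sin t)^T$ satisfy $(x-\mu^{(i)})^T(\Sigma^{(i)})^{-1}(x-\mu^{(i)}) = 1$ exactly, and then lifts them to height $\pi^{(i)}$.

```python
import math
from typing import List, Sequence, Tuple


def term_ellipse(mean: Sequence[float], cov: Sequence[Sequence[float]],
                 weight: float, num_points: int = 64) -> List[Tuple[float, float, float]]:
    """Points on {x : (x - mu)^T Sigma^{-1} (x - mu) = 1}, lifted to height pi.

    Sigma must be symmetric positive definite. With Sigma = L L^T (Cholesky),
    x = mu + L u lies on the ellipse whenever u lies on the unit circle.
    """
    (a, b), (b2, c) = cov
    if abs(b - b2) > 1e-12:
        raise ValueError("covariance matrix must be symmetric")
    l11 = math.sqrt(a)              # math.sqrt raises ValueError for a < 0
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)  # raises ValueError if Sigma is indefinite
    points = []
    for k in range(num_points):
        t = 2.0 * math.pi * k / num_points
        u, v = math.cos(t), math.sin(t)
        points.append((mean[0] + l11 * u,
                       mean[1] + l21 * u + l22 * v,
                       weight))     # constant height: parallel to the (mu_x, mu_y) plane
    return points


if __name__ == "__main__":
    pts = term_ellipse((-3.0, -3.0), ((1.0, 0.8), (0.8, 1.0)), 0.5, num_points=8)
    for p in pts:
        print(p)
```

The same construction extends to the trivariate ellipsoids by applying a $3 \times 3$ Cholesky factor to points on the unit sphere.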


Figure 3: Sample screen snapshot for the bivariate finite mixtures case (panel title: "Iteration Number 1").


Once again we indicate the presence of a new term by using a ‘+’ rather than a ‘*’ at the center of the ellipse.

Now that we have made the transition into three-space, a quick word about viewpoints is in order. We follow the MATLAB convention and specify our viewpoint as a two-vector $(\phi, \theta)$, where $\phi$ is the rotation angle about the $z$ axis in degrees, with 0 coinciding with the $x$ axis and positive angles representing counterclockwise rotation, and $\theta$ is the elevation angle of the viewing eye, measured in degrees with respect to the $xy$ plane. The viewpoint in Figure 3 is the default viewpoint of $(-37.5, 45)$.

Trivariate Representation

In this case each term is plotted as an ellipsoid in $(\mu_x, \mu_y, \mu_z)$ space. The ellipsoid is determined by $(x - \mu^{(i)})^T (\Sigma^{(i)})^{-1} (x - \mu^{(i)}) = 1$, so each egg is centered at $(\mu^{(i)}_x, \mu^{(i)}_y, \mu^{(i)}_z)$. Since the trivariate term fully occupies the dimensionality of the embedding space, we are faced with the question of how to represent the mixing coefficient for this term. We have chosen to use the color of the egg to indicate each term's mixing coefficient, and we present the color ramp (the mapping from colors to $\pi$ values) above the plot for ease of reference. We have chosen not to scatter plot the underlying data in this case so as not to clutter the graph. Figure 4 presents a plot of the mixture $f(x, y, z) = 0.5\,N((-4, -4, -4), I_d) + 0.5\,N((4, 4, 4), I_d)$, where $I_d$ is the $3 \times 3$ identity matrix.


3: Results


MATLAB Implementation

The visualization code was initially developed on a 486/33 MHz computer using MATLAB 4.2. The code was then transferred to a SILICON GRAPHICS INDY 2 platform for further development. MATLAB was chosen because of its computational capabilities as well as its many graphics tools, e.g., the ability to make movies of the density estimation process. Nothing in the code or the process is machine dependent, which allows for wider usage. Additionally, the authors took care to use only those functions that come with the MATLAB package itself; i.e., no toolbox functions were used in the implementation. The functions are written in a modular manner for greater adaptability and ease of use. Several switches are implemented that allow the user to tailor a given run; for example, a user may want to run FMDE without graphical output or print screen snapshots at certain iterations.

Univariate Results

We present results that illustrate the application of the procedure to univariate, bivariate, and trivariate finite and adaptive mixtures models. Each test case has been chosen to best illustrate the effectiveness of the procedure. As can be expected, it is difficult to display a dynamic process in a set of stills, and the process is hard to fully appreciate without the use of movies. We will have more to say about movies at the end of this section.

The first test case consists of 1000 points drawn from the mixture $f_0(x) = 0.25\,N(-6,1) + 0.25\,N(-2,1) + 0.25\,N(2,1) + 0.25\,N(6,1)$. We illustrate our technique by considering the evolution of a 4-component finite mixture model on this data set. The initial settings of the model are as follows:

$\pi_1 = 0.05$, $\mu_1 = -10$, $\sigma_1^2 = 1.3$;

$\pi_2 = 0.05$, $\mu_2 = -5$, $\sigma_2^2 = 0.03$;

$\pi_3 = 0.45$, $\mu_3 = 0$, $\sigma_3^2 = 0.03$;

$\pi_4 = 0.53$, $\mu_4 = 10$, $\sigma_4^2 = 1.3$.

This initial model is displayed in Figure 5.

Figure 5: … finite mixture test case (panel title: "Iteration number 0").

The top frame contains a standard functional representation of the probability density functions of the mixture model, rendered in magenta, and of the true model, rendered in green. In the bottom frame each term in the model takes the form of an ellipse, and the first 100 points of the data are plotted in green along the $x$ axis. The initial configuration of the $x$ axis is data driven, and this in part leads to only a partial display of the initial terms. As expected from the nature of the EM algorithm, the terms are ultimately drawn into closer interaction with the data, and hence this display problem is resolved.


Figure 6 displays the model after the first iteration through the data.

Figure 6: The 4 mode 4 term finite mixtures test case after the first iteration through the data (panel title: "Iteration number 1").

We notice that there has been a large adjustment of the parameters at the end of the first step. This is not surprising given how far "off track" the parameters initially started. As will be seen, subsequent frames indicate that this initial adjustment is much larger than the later ones, which is suggestive of the steep nature of the likelihood surface at the perimeter. Insights of this type are one of the benefits of the visualization process. Figures 7a, b, and c portray the solution at 10, 25, and 50 iterations, respectively. The final parameters in the model are given by:

$\pi_1 = 0.2476$, $\mu_1 = -2.1143$, $\sigma_1^2 = 1.1239$;

$\pi_2 = 0.2440$, $\mu_2 = -6.1216$, $\sigma_2^2 = 0.9713$;

$\pi_3 = 0.2747$, $\mu_3 = 1.9936$, $\sigma_3^2 = 1.6668$;

$\pi_4 = 0.2338$, $\mu_4 = 6.0583$, $\sigma_4^2 = 0.8504$.
In this case the EM method converges to the correct solution. We conclude our analysis of this example with a plot of the trajectories of the system in parameter space (Figure 8). In this case we use a coordinate system given by $(\mu_x, \pi)$. The time evolution of the …

Figure 7: The system after (a) 10, (b) 25, and (c) 50 iterations.

Figure 8: Phase space trajectories for the 4 term FMDE case.

We next turn our attention to a univariate case for the adaptive mixtures estimator. In this case our sample is 100 points drawn from $f_0(x) = 0.5\,N(-2,1) + 0.5\,N(2,1)$. Figure 9a displays the state of the system after the first data point. As expected, the model consists of a single term centered at this data point. Figures 9b and c show the state of the system after the second and third data points. We notice that the second point fell within the support region of the first term, and hence the model was updated using the recursive update equations and no term creation took place. A new term is created after the third point. Figures 10a, b, and c show the state of the system after 25, 50, and 100 data points, respectively. We notice the good fit between the adaptive mixtures model and the underlying probability distribution at this time.
We conclude our univariate examples by applying the adaptive mixtures procedure to a 100-point data set drawn from the four-mode, four-term distribution of our first example. We present this to illustrate the different character of the solutions computed by the two procedures. Figure 11 illustrates the AMDE solution after the last data point. At this time there is still a fairly good fit between the overdetermined mixture model and the true distribution. The overdetermined nature of the solution is a small price to pay when one considers that the model was produced without an initial estimate of the number of terms in the model or their positions. In fact, compared with the equivalent kernel estimator, which contains 100 terms (one per data point), the AMDE is quite frugal.

Figure 11: State of the AMDE after presentation of 100 points from a 4 mode 4 term distribution (panel title: "Data number 100").


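Bivariate Results

We next turn our attention to two bivariate examples. In the first we consider 100 points drawn from a two-component mixture given by

$$f_0(x,y) = 0.3\,N\!\left((-3,-3), \begin{pmatrix} 1.0 & 0.8 \\ 0.8 & 1.0 \end{pmatrix}\right) + 0.1\,N\!\left((3,3), \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix}\right). \qquad (13)$$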


We first consider a two-component finite mixtures solution based on this data set. The initial model is given by

$$f(x,y) = 0.5\,N\!\left((-5,-5), \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right) + 0.5\,N\!\left((5,5), \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right). \qquad (14)$$


In Figures 12a and b we present the initial configuration of the model and the model after 7 iterations through the data. We notice the close match between the final configuration of the mixture model and the true distribution.

Figure 12: (a) Initial and (b) final configuration of the two-term finite mixture solution. Views of [-37.5, 45] and [0, 90] are presented in each case.
Once again it is interesting to compare the nature of the finite mixtures solution to that obtained using the adaptive mixtures procedure. In Figure 13 we portray the final configuration of the adaptive mixtures solution based on 100 points drawn from the above distribution. This solution, which consists of 8 terms, was obtained using a create threshold $T_c = 1.53^2 \approx 2.34$. This value was chosen to match the create threshold of 1 used in the univariate simulations. We draw particular attention to the manner in which the estimator has modeled the leftmost region, which contains the correlation: the terms are placed end to end along the thin ridge. It is interesting to compare this with the long, narrow term obtained by the finite mixtures estimator. Once again we see the utility of the visualization process.

Figure 13: Final configuration of the eight-term adaptive mixtures solution.


In Figures 14a and b we present two views of the probability density function that results from this mixture. We see from view (a) that the relative heights of the two peaks seem appropriate with respect to the underlying mixing proportions. In view (b) we clearly see the correlated and uncorrelated peaks in the probability density function.

Figure 14: (a) and (b) Two views of the pdf corresponding to the solution of Figure 13.


Trivariate Results

The final case involves the adaptive mixtures analysis of a bimodal trivariate data set. The data set consists of 100 points drawn from

$$f_0(x,y,z) = \cdots + 0.5\,N\left((3,3,3), \Sigma_2\right), \qquad (15)$$

where $\Sigma_1$ and $\Sigma_2$ are given by

$$\Sigma_1 = \begin{pmatrix} 1.0 & 0.8 & 0 \\ 0.8 & 1.0 & 0 \\ 0 & 0 & 1.0 \end{pmatrix} \quad \text{and} \quad \Sigma_2 = \begin{pmatrix} 1.0 & -0.8 & 0 \\ -0.8 & 1.0 & 0 \\ 0 & 0 & 1.0 \end{pmatrix}.$$

In Figures 15a and b we present the solution after 25 and 100 points, respectively. A create threshold of $T_c = 3.54 \approx 1.88^2$ was used; this value reflects the appropriate normalization for dimensionality. As indicated previously, each term in the model is represented as an ellipsoid determined by the covariance structure of the term. We notice that the correlation structure of the data is clearly indicated by the AMDE model.
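As an illustration of what such a dimension normalization could look like (an assumption on our part, not a description of the authors' rule): for a point drawn from a single normal term, MHD is chi-square distributed with $d$ degrees of freedom, so one can pick $T_c$ in $d$ dimensions to give the same coverage $P(\mathrm{MHD} \le T_c)$ as the univariate threshold of 1, about 68.3%. The Python sketch below computes such equal-coverage thresholds for $d \le 3$ using closed-form chi-square CDFs; the function names are hypothetical.

```python
import math


def chi2_cdf(x: float, k: int) -> float:
    """CDF of the chi-square distribution with k = 1, 2, or 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    if k == 1:
        return math.erf(math.sqrt(x / 2.0))
    if k == 2:
        return 1.0 - math.exp(-x / 2.0)
    if k == 3:
        return (math.erf(math.sqrt(x / 2.0))
                - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("this sketch only handles k = 1, 2, 3")


def chi2_quantile(p: float, k: int, hi: float = 100.0, iters: int = 200) -> float:
    """Invert chi2_cdf by bisection on [0, hi]; the CDF is increasing in x."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def matched_threshold(d: int, univariate_threshold: float = 1.0) -> float:
    """Create threshold in d dimensions with the same coverage as the 1-D threshold."""
    coverage = chi2_cdf(univariate_threshold, 1)  # P(MHD <= 1) in 1-D, about 0.683
    return chi2_quantile(coverage, d)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, round(matched_threshold(d), 3))
```

This rule gives about 2.30 for $d = 2$ and about 3.53 for $d = 3$, close to but not identical to the values 2.34 and 3.54 used in the experiments, so the authors' normalization may differ in detail.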



On-line Access of Movies

Movies for each of the cases we have discussed can be accessed via our on-line MOSAIC server at irisd.nswc.navy.mil (128.38.40.50). Some background discussion and MPEG movies are provided for each case. The reader is encouraged to view these movies in order to fully appreciate the process. In addition, the movies in MPEG and MATLAB format are available via anonymous FTP from irisd.

4: Conclusion

The EM algorithm can be used to perform maximum likelihood based estimation of unknown probability distributions. This estimation can take the form of the parametric finite mixtures procedure or the semi-parametric adaptive mixtures procedure. In either case, the time evolution of these systems can be very difficult to follow.

We have developed a new visualization technique to aid in the study of the time evolution of these parametric and nonparametric estimators. This technique makes use of graphical abstractions of the mixture model structure. We have found this procedure useful in gaining insights into the inner workings of both the finite mixtures and adaptive mixtures procedures.

We have also provided access to the movies produced using these visualization techniques. We plan to use these techniques to aid in our future research and pedagogical efforts. Some of our future research will focus on the development of new adaptive bandwidth estimators that use alternate create criteria and estimation procedures.